AITopics | convex clustering

Clustering approaches that utilize convex loss functions have recently attracted growing interest in the formation of compact data clusters. Although classical methods like k means and its wide family of variants are still widely used, all of them require the number of clusters (k) to be supplied as input and many are notably sensitive to initialization. Convex clustering provides a more stable alternative by formulating the clustering task as a convex optimization problem, ensuring a unique global solution. However, it faces challenges in handling high-dimensional data, especially in the presence of noise and outliers. Additionally, strong fusion regularization, controlled by the tuning parameter, can hinder effective cluster formation within a convex clustering framework. To overcome these challenges, we introduce a robust approach that integrates convex clustering with the Median of Means (MoM) estimator, thus developing an outlier-resistant and efficient clustering framework that does not necessitate a prior knowledge of the number of clusters. By leveraging the robustness of MoM alongside the stability of convex clustering, our method enhances both performance and efficiency, especially on large-scale datasets. Theoretical analysis demonstrates weak consistency under specific conditions, while experiments on synthetic and real-world datasets validate the method's superior performance compared to existing approaches. Clustering is a fundamental task in unsupervised learning, aiming to organize unlabeled data into coherent groups for better interpretation and downstream applications.

algorithm, artificial intelligence, machine learning, (16 more...)

arXiv.org Machine Learning

2511.14784

Country: North America > United States (0.94)

Genre: Research Report > New Finding (0.67)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

A New Framework for Convex Clustering in Kernel Spaces: Finite Sample Bounds, Consistency and Performance Insights

Pan, Shubhayan, Chakraborty, Saptarshi, Paul, Debolina, Bose, Kushal, Das, Swagatam

arXiv.org Machine LearningNov-10-2025

Convex clustering is a well-regarded clustering method, resembling the similar centroid-based approach of Lloyd's $k$-means, without requiring a predefined cluster count. It starts with each data point as its centroid and iteratively merges them. Despite its advantages, this method can fail when dealing with data exhibiting linearly non-separable or non-convex structures. To mitigate the limitations, we propose a kernelized extension of the convex clustering method. This approach projects the data points into a Reproducing Kernel Hilbert Space (RKHS) using a feature map, enabling convex clustering in this transformed space. This kernelization not only allows for better handling of complex data distributions but also produces an embedding in a finite-dimensional vector space. We provide a comprehensive theoretical underpinnings for our kernelized approach, proving algorithmic convergence and establishing finite sample bounds for our estimates. The effectiveness of our method is demonstrated through extensive experiments on both synthetic and real-world datasets, showing superior performance compared to state-of-the-art clustering techniques. This work marks a significant advancement in the field, offering an effective solution for clustering in non-linear and non-convex data scenarios.

artificial intelligence, convex, machine learning, (18 more...)

arXiv.org Machine Learning

2511.05159

Country:

Asia > India > West Bengal > Kolkata (0.04)
North America > United States > Michigan (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Convex Clustering through MM: An Efficient Algorithm to Perform Hierarchical Clustering

Touw, Daniel J. W., Groenen, Patrick J. F., Terada, Yoshikazu

arXiv.org Machine LearningDec-21-2023

Convex clustering is a modern method with both hierarchical and $k$-means clustering characteristics. Although convex clustering can capture complex clustering structures hidden in data, the existing convex clustering algorithms are not scalable to large data sets with sample sizes greater than several thousands. Moreover, it is known that convex clustering sometimes fails to produce a complete hierarchical clustering structure. This issue arises if clusters split up or the minimum number of possible clusters is larger than the desired number of clusters. In this paper, we propose convex clustering through majorization-minimization (CCMM) -- an iterative algorithm that uses cluster fusions and a highly efficient updating scheme derived using diagonal majorization. Additionally, we explore different strategies to ensure that the hierarchical clustering structure terminates in a single cluster. With a current desktop computer, CCMM efficiently solves convex clustering problems featuring over one million objects in seven-dimensional space, achieving a solution time of 51 seconds on average.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

2211.01877

Country:

Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
Europe > Sweden > Östergötland County > Linköping (0.04)
Europe > Netherlands > South Holland > Rotterdam (0.04)
(2 more...)

Genre: Research Report > New Finding (0.68)

Industry: Energy > Power Industry (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Convex Clustering with Exemplar-Based Models

Neural Information Processing SystemsApr-6-2023, 14:38:12 GMT

Clustering is often formulated as the maximum likelihood estimation of a mixture model that explains the data. The EM algorithm widely used to solve the resulting optimization problem is inherently a gradient-descent method and is sensitive to initialization. The resulting solution is a local optimum in the neighborhood of the initial guess. This sensitivity to initialization presents a significant challenge in clustering large data sets into many clusters. In this paper, we present a dif- ferent approach to approximate mixture fitting for clustering.

Add feedback

Supervised Convex Clustering

Wang, Minjie, Yao, Tianyi, Allen, Genevera I.

arXiv.org Machine LearningMay-25-2020

Clustering has long been a popular unsupervised learning approach to identify groups of similar objects and discover patterns from unlabeled data in many applications. Yet, coming up with meaningful interpretations of the estimated clusters has often been challenging precisely due to its unsupervised nature. Meanwhile, in many real-world scenarios, there are some noisy supervising auxiliary variables, for instance, subjective diagnostic opinions, that are related to the observed heterogeneity of the unlabeled data. By leveraging information from both supervising auxiliary variables and unlabeled data, we seek to uncover more scientifically interpretable group structures that may be hidden by completely unsupervised analyses. In this work, we propose and develop a new statistical pattern discovery method named Supervised Convex Clustering (SCC) that borrows strength from both information sources and guides towards finding more interpretable patterns via a joint convex fusion penalty. We develop several extensions of SCC to integrate different types of supervising auxiliary variables, to adjust for additional covariates, and to find biclusters. We demonstrate the practical advantages of SCC through simulations and a case study on Alzheimer's Disease genomics. Specifically, we discover new candidate genes as well as new subtypes of Alzheimer's Disease that can potentially lead to better understanding of the underlying genetic mechanisms responsible for the observed heterogeneity of cognitive decline in older adults.

artificial intelligence, auxiliary variable, machine learning, (16 more...)

arXiv.org Machine Learning

2005.12198

Country:

North America > United States > Texas > Harris County > Houston (0.04)
Europe > Sweden > Östergötland County > Linköping (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Dynamic Visualization and Fast Computation for Convex Clustering via Algorithmic Regularization

Weylandt, Michael, Nagorski, John, Allen, Genevera I.

arXiv.org Machine LearningJan-10-2019

Convex clustering is a promising new approach to the classical problem of clustering, combining strong performance in empirical studies with rigorous theoretical foundations. Despite these advantages, convex clustering has not been widely adopted, due to its computationally intensive nature and its lack of compelling visualizations. To address these impediments, we introduce Algorithmic Regularization, an innovative technique for obtaining high-quality estimates of regularization paths using an iterative one-step approximation scheme. We justify our approach with a novel theoretical result, guaranteeing global convergence of the approximate path to the exact solution under easily-checked non-data-dependent assumptions. The application of algorithmic regularization to convex clustering yields the Convex Clustering via Algorithmic Regularization Paths (CARP) algorithm for computing the clustering solution path. On example data sets from genomics and text analysis, CARP delivers over a 100-fold speed-up over existing methods, while attaining a finer approximation grid than standard methods. Furthermore, CARP enables improved visualization of clustering solutions: the fine solution grid returned by CARP can be used to construct a convex clustering-based dendrogram, as well as forming the basis of a dynamic path-wise visualization based on modern web technologies. Our methods are implemented in the open-source R package clustRviz, available at https://github.com/DataSlingers/clustRviz.

algorithm, clustering, convex, (14 more...)

arXiv.org Machine Learning

1901.01477

Country:

North America > United States > Washington > King County > Bellevue (0.04)
North America > Canada > Quebec > Montreal (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Alpes-Maritimes > Nice (0.04)
(9 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Government > Regional Government > North America Government > United States Government (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.95)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Add feedback

Sparse Convex Clustering

Wang, Binhuan, Zhang, Yilong, Sun, Will Wei, Fang, Yixin

arXiv.org Machine LearningFeb-10-2017

Convex clustering, a convex relaxation of k-means clustering and hierarchical clustering, has drawn recent attentions since it nicely addresses the instability issue of traditional nonconvex clustering methods. Although its computational and statistical properties have been recently studied, the performance of convex clustering has not yet been investigated in the high-dimensional clustering scenario, where the data contains a large number of features and many of them carry no information about the clustering structure. In this paper, we demonstrate that the performance of convex clustering could be distorted when the uninformative features are included in the clustering. To overcome it, we introduce a new clustering method, referred to as Sparse Convex Clustering, to simultaneously cluster observations and conduct feature selection. The key idea is to formulate convex clustering in a form of regularization, with an adaptive group-lasso penalty term on cluster centers. In order to optimally balance the tradeoff between the cluster fitting and sparsity, a tuning criterion based on clustering stability is developed. In theory, we provide an unbiased estimator for the degrees of freedom of the proposed sparse convex clustering method. Finally, the effectiveness of the sparse convex clustering is examined through a variety of numerical experiments and a real data application.

convex, selection, sparse convex, (16 more...)

arXiv.org Machine Learning

doi: 10.1080/10618600.2017.1377081

1601.04586

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Add feedback

Convex Clustering with Exemplar-Based Models

Lashkari, Danial, Golland, Polina

Neural Information Processing SystemsDec-31-2008

Clustering is often formulated as the maximum likelihood estimation of a mixture model that explains the data. The EM algorithm widely used to solve the resulting optimization problem is inherently a gradient-descent method and is sensitive to initialization. The resulting solution is a local optimum in the neighborhood of the initial guess. This sensitivity to initialization presents a significant challenge in clustering large data sets into many clusters. In this paper, we present a different approach to approximate mixture fitting for clustering. We introduce an exemplar-based likelihood function that approximates the exact likelihood. This formulation leads to a convex minimization problem and an efficient algorithm with guaranteed convergence to the globally optimal solution. The resulting clustering can be thought of as a probabilistic mapping of the data points to the set of exemplars that minimizes the average distance and the information-theoretic cost of mapping.

algorithm, clustering, mixture model, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > New York (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.50)

Add feedback

Convex Clustering with Exemplar-Based Models

Lashkari, Danial, Golland, Polina

Neural Information Processing SystemsDec-31-2008

Clustering is often formulated as the maximum likelihood estimation of a mixture model that explains the data. The EM algorithm widely used to solve the resulting optimization problem is inherently a gradient-descent method and is sensitive to initialization. The resulting solution is a local optimum in the neighborhood of the initial guess. This sensitivity to initialization presents a significant challenge in clustering large data sets into many clusters. In this paper, we present a different approach to approximate mixture fitting for clustering. We introduce an exemplar-based likelihood function that approximates the exact likelihood. This formulation leads to a convex minimization problem and an efficient algorithm with guaranteed convergence to the globally optimal solution. The resulting clustering can be thought of as a probabilistic mapping of the data points to the set of exemplars that minimizes the average distance and the information-theoretic cost of mapping.

algorithm, clustering, mixture model, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > New York (0.04)
Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.50)

Add feedback

Convex Clustering with Exemplar-Based Models

Lashkari, Danial, Golland, Polina

Neural Information Processing SystemsDec-31-2008

Clustering is often formulated as the maximum likelihood estimation of a mixture model that explains the data. The EM algorithm widely used to solve the resulting optimization problem is inherently a gradient-descent method and is sensitive to initialization. The resulting solution is a local optimum in the neighborhood of the initial guess. This sensitivity to initialization presents a significant challenge in clustering large data sets into many clusters. In this paper, we present a different approachto approximate mixture fitting for clustering. We introduce an exemplar-based likelihood function that approximates the exact likelihood. This formulation leads to a convex minimization problem and an efficient algorithm with guaranteed convergence to the globally optimal solution. The resulting clustering canbe thought of as a probabilistic mapping of the data points to the set of exemplars that minimizes the average distance and the information-theoretic cost of mapping.

Add feedback

Filters

Collaborating Authors

convex clustering

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Convex Clustering Redefined: Robust Learning with the Median of Means Estimator

A New Framework for Convex Clustering in Kernel Spaces: Finite Sample Bounds, Consistency and Performance Insights

Convex Clustering through MM: An Efficient Algorithm to Perform Hierarchical Clustering

Convex Clustering with Exemplar-Based Models

Supervised Convex Clustering

Dynamic Visualization and Fast Computation for Convex Clustering via Algorithmic Regularization

Sparse Convex Clustering

Convex Clustering with Exemplar-Based Models

Convex Clustering with Exemplar-Based Models

Convex Clustering with Exemplar-Based Models